Navigating duplication in pharmacovigilance databases: a scoping review

Objectives Pharmacovigilance databases play a critical role in monitoring drug safety. The duplication of reports in pharmacovigilance databases, however, undermines their data integrity. This scoping review sought to provide a comprehensive understanding of duplication in pharmacovigilance databases worldwide.
Design A scoping review.
Data sources Reviewers comprehensively searched the literature in PubMed, Web of Science, Wiley Online Library, EBSCOhost, Google Scholar and other relevant websites.
Eligibility criteria Peer-reviewed publications and grey literature, without language restriction, describing duplication and/or methods relevant to duplication in pharmacovigilance databases from inception to 1 September 2023.
Data extraction and synthesis We used the Joanna Briggs Institute guidelines for scoping reviews and conformed with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews. Two reviewers independently screened titles, abstracts and full texts. One reviewer extracted the data and performed descriptive analysis, which the second reviewer assessed. Disagreements were resolved by discussion and consensus or in consultation with a third reviewer.
Results We screened 22 745 unique titles and 156 were eligible for full-text review. Of the 156 titles, 58 (47 peer-reviewed; 11 grey literature) fulfilled the inclusion criteria for the scoping review. Included titles addressed the extent (5 papers), prevention strategies (15 papers), causes (32 papers), detection methods (25 papers), management strategies (24 papers) and implications (14 papers) of duplication in pharmacovigilance databases. The papers overlapped, discussing more than one field. Advances in artificial intelligence, particularly natural language processing, hold promise in enhancing the efficiency and precision of deduplication of large and complex pharmacovigilance databases.
Conclusion Duplication in pharmacovigilance databases compromises risk assessment and decision-making, potentially threatening patient safety. Therefore, efficient duplicate prevention, detection and management are essential for more reliable pharmacovigilance data. To minimise duplication, consistent use of worldwide unique identifiers as the key case identifiers is recommended alongside recent advances in artificial intelligence.

The authors should clarify the distinction between duplication and replication in this section. For example, it seems clear that the 'very high level of duplication, from 66% to 87%, was observed in the FAERS database for reports from the literature' is almost entirely due to replication enforced by the obligations imposed by the Competent Authorities on MAHs reporting ICSRs from the literature. Equally notably (at lines 12 & 13), the reference to duplication 'being highest for reports from published literature (11%)' in VigiBase is a direct reflection of the same phenomenon.
4. Discussion: The section requires updating based on the recognition of replication of case reports as an adjunct to duplication. As indicated above, replication is largely induced by regulatory reporting obligations, contractual agreements, and partnerships leading to the creation of case replicas. Resolution of this problem will require significant care, and the potential redesign of pharmacovigilance data management systems at a global level.
The authors should emphasize the importance of adhering to the principles set down in ICH E2B(R3) Section C.1.8 concerning worldwide unique case identification. Provision has been made for the population of the field C.1.8.1 'Worldwide unique case identification number'. This is an important addition to the manuscript, as this single field (C.1.8.1) becomes, in effect, the primary key for case identification, and therefore a practical solution to reduce and manage both duplication and replication.
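To make the primary-key idea concrete, the following is a minimal sketch and not the method of the manuscript or of any regulatory system. It assumes each report is a plain dictionary; the name `worldwide_case_id` stands in for field C.1.8.1, and every other field name, identifier and record is hypothetical. Normalising case and whitespace before grouping is also an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical, simplified reports. Only "worldwide_case_id" is meant to
# mirror C.1.8.1; all other names and values are illustrative assumptions.
reports = [
    {"safety_report_id": "REG-001", "worldwide_case_id": "ZZ-EXAMPLECO-0001", "sender": "regulator"},
    {"safety_report_id": "MAH-778", "worldwide_case_id": "zz-exampleco-0001 ", "sender": "licence partner"},
    {"safety_report_id": "REG-002", "worldwide_case_id": "ZZ-EXAMPLECO-0002", "sender": "regulator"},
    {"safety_report_id": "LIT-009", "worldwide_case_id": None, "sender": "literature"},
]


def group_by_worldwide_id(reports):
    """Group reports by their normalised worldwide unique case identifier.

    Returns (groups, unkeyed): a dict mapping identifier -> list of reports,
    and a list of reports that carry no identifier at all.
    """
    groups = defaultdict(list)
    unkeyed = []
    for report in reports:
        # Assumed normalisation: trim whitespace and ignore letter case.
        key = (report.get("worldwide_case_id") or "").strip().upper()
        if key:
            groups[key].append(report)
        else:
            unkeyed.append(report)
    return dict(groups), unkeyed


def replicated_cases(groups):
    """Keep only identifiers that appear on more than one report."""
    return {key: items for key, items in groups.items() if len(items) > 1}


groups, unkeyed = group_by_worldwide_id(reports)
replicas = replicated_cases(groups)
```

In this sketch, reports sharing an identifier are collapsed into one case, while reports with an empty identifier are set aside, since an exact key match cannot resolve them and they would need some other form of duplicate detection.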
Further discussion of potential solutions is required. For example, please consider expanding on one or more of the following themes:
- Standardized data entry to enable consistent recording of adverse event data, making it easier to identify duplicates [ICH E2B(R3), ICH E2D(R1)]
- Regular data cleaning and deduplication
- Assigning unique identifiers to patients and adverse events
- Automated and semi-automated search criteria based on similarities in patient demographics (age, gender), unique identifiers, adverse reactions, and suspected medicines
- Improved data validation and quality assurance processes
- Collaboration between health facilities, pharmaceutical companies, and reporters
- Improved data sharing between regulatory authorities and international databases
5. Recommendations: The authors should consider strong recommendations concerning the use of the field C.1.8.1 (as defined within the ICH E2B(R3) guideline). If this is used as the primary key for case identification, it becomes the pragmatic solution to both reduce and manage case duplication and replication.
6. Table 2 contains many typographical errors and is poorly formatted. It also contains two overtly promotional claims (p38, lines 6 to 9) concerning a commercial database product and a pharmaceutical company. This text should be removed if Table 2 is intended to be published as an annex.

REVIEWER
Simmering, Jacob
The University of Iowa College of Pharmacy
REVIEW RETURNED
18-Dec-2023

GENERAL COMMENTS
Kiguba and coauthors present a paper summarizing the frequency of duplication (repeated entry of the same adverse drug event) in multiple pharmacovigilance data resources, as well as a very high-level overview of how data duplication can be addressed. Recommendations based on this review are efforts to improve data entry accuracy (better data entry will always outperform data cleaning), checking at the time of data entry for data duplication, and a few tools (like Ablebits) that may be suitable for automated deduplication.
I am sympathetic to the problem of data duplication. As a health services researcher using administrative health records, finding and resolving duplication is a significant part of my professional work. I understand that data duplication can make estimating incidence, and therefore risk, difficult, but the other limitations (cost, resource wastage) seem minor compared to the costs and complexity of a program like those run by the FDA, EMA, or WHO. A little more detail on *why* data duplication is problematic in the introduction and conclusion would perhaps "sell" me more on the critical importance of this idea.

Specific comments:
Comment 4: The authors should clarify the distinction between duplication and replication in this section. For example, it seems clear that the 'very high level of duplication, from 66% to 87%, was observed in the FAERS database for reports from the literature' is almost entirely due to replication enforced by the obligations imposed by the Competent Authorities on MAHs reporting ICSRs from the literature. Equally notably (at lines 12 & 13), the reference to duplication 'being highest for reports from published literature (11%)' in VigiBase is a direct reflection of the same phenomenon.
Response 4: The manuscript has been updated. See page 10. Thank you.
Comment 5: Discussion: The section requires updating based on the recognition of replication of case reports as an adjunct to duplication. As indicated above, replication is largely induced by regulatory reporting obligations, contractual agreements, and partnerships leading to the creation of case replicas. Resolution of this problem will require significant care, and the potential redesign of pharmacovigilance data management systems at a global level.
Response 5: The manuscript has been updated. See page 14. Thank you.
Comment 6: Discussion: The authors should emphasize the importance of adhering to the principles set down in ICH E2B(R3) Section C.1.8 concerning worldwide unique case identification. Provision has been made for the population of the field C.1.8.1 'Worldwide unique case identification number'. This is an important addition to the manuscript, as this single field (C.1.8.1) becomes, in effect, the primary key for case identification, and therefore a practical solution to reduce and manage both duplication and replication.
Response 6: The manuscript has been updated, see pages 14-15. Thank you.
Comment 7: Discussion: Further discussion of potential solutions is required. For example, please consider expanding on one or more of the following themes: Standardized data entry to enable consistent recording of adverse event data, making it easier to identify duplicates [ICH E2B(R3), ICH E2D(R1)]; Regular data cleaning and deduplication; Assigning unique identifiers to patients and adverse events; Automated and semi-automated search criteria based on similarities in patient demographics (age, gender), unique identifiers, adverse reactions, and suspected medicines; Improved data validation and quality assurance processes; Collaboration between health facilities, pharmaceutical companies, and reporters; Improved data sharing between regulatory authorities and international databases.
Response 7: The manuscript has been updated. See pages 14-15. Thank you.
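The theme in Comment 7 on automated search criteria based on similarities in demographics, adverse reactions and suspected medicines can be illustrated with a minimal sketch. This is a simple weighted similarity score chosen for illustration; it is not the method used by any of the databases or tools discussed in the manuscript. The field names, weights, age tolerance, threshold and records are all hypothetical assumptions.

```python
from itertools import combinations

# Hypothetical reports; all field names and values are illustrative.
reports = [
    {"id": "R1", "age": 54, "sex": "F",
     "reactions": {"hepatotoxicity", "rash"}, "drugs": {"drug a"}},
    {"id": "R2", "age": 54, "sex": "F",
     "reactions": {"hepatotoxicity", "rash"}, "drugs": {"drug a", "drug b"}},
    {"id": "R3", "age": 30, "sex": "M",
     "reactions": {"headache"}, "drugs": {"drug c"}},
]


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(r1, r2, age_tolerance=1):
    """Weighted score in [0, 1]; the weights are arbitrary assumptions."""
    score = 0.0
    if r1["sex"] is not None and r1["sex"] == r2["sex"]:
        score += 0.2
    if (r1["age"] is not None and r2["age"] is not None
            and abs(r1["age"] - r2["age"]) <= age_tolerance):
        score += 0.2
    score += 0.3 * jaccard(r1["reactions"], r2["reactions"])
    score += 0.3 * jaccard(r1["drugs"], r2["drugs"])
    return score


def candidate_duplicates(reports, threshold=0.8):
    """Return (id, id, score) for every pair scoring at or above threshold."""
    pairs = []
    for r1, r2 in combinations(reports, 2):
        s = similarity(r1, r2)
        if s >= threshold:
            pairs.append((r1["id"], r2["id"], round(s, 2)))
    return pairs


pairs = candidate_duplicates(reports)
```

Pairs flagged this way are candidates for manual review rather than confirmed duplicates, which is the semi-automated element of the theme. Comparing every pair scales quadratically with the number of reports, so a large database would need to restrict comparisons to plausible subsets first.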
General comment: This manuscript provides a comprehensive review of the occurrence of duplication of individual case safety reports in pharmacovigilance. It is well-written, balanced and offers a valuable update on this important topic.
Response 1: Thank you.
Please provide a definition of duplication of ICSRs and add context by highlighting the differences between duplication and replication of ICSRs. Reference 78 provides important insights on the distinction between these two areas.
Response 3: Done. See page 5 of the proposal with track changes. Thank you.
Major
1. Need more detail on how the papers defined duplication in your results.
2. I'm not sure what "reports in the published literature" means? Duplication rates are higher but I have no idea what it means (see page 11, lines 10-14).
3. It appears duplication rates also vary by medication, at least in the FDA data. Is this the case and why might it be the case?